
feat: add FireworksTrainingRolloutProcessor for RFT (FIR2-1351) #445

Closed
benjibc wants to merge 1 commit into main from cursor/managed-rl-training-rollout-processor-135f

Conversation


benjibc (Contributor) commented Apr 22, 2026

Description

Adds a new default RolloutProcessor subclass that drives Fireworks /v1/completions via FireworksV1CompletionsClient and surfaces the per-sample prompt token ids, completion token ids, and inference logprobs required by reinforcement fine-tuning (RFT) training loops (GRPO, CISPO, DAPO, GSPO).

Problem

SingleTurnRolloutProcessor uses LiteLLM chat completions and discards token-level data, so scored EvaluationRows are fine for evaluation but cannot feed a training loop. Today, teams that need training-ready rollouts write a bespoke RolloutProcessor — the FrozenLake example in fw-ai/cookbook is ~800 lines. This puts training-compatible rollouts out of reach of every customer evaluator bundle unless they reimplement the rollout path themselves.

Fireworks' managed RFT flow needs this for every customer job, so promoting the pattern into an Eval Protocol default removes the per-customer 800-line tax.

What it does

For each EvaluationRow, FireworksTrainingRolloutProcessor:

  • Reads model / temperature / max_tokens / n from completion_params.
  • Builds prompt token ids locally via FireworksV1CompletionsClient.build_prompt_token_ids(...).
  • Fires n parallel /v1/completions calls from the same prompt_token_ids, so each completion gets independent retry behaviour rather than collapsing on partial server failures.
  • Appends the first completion as the assistant message so existing evaluators that inspect last_assistant_message() keep scoring without modification.
  • Populates EvaluationRow.execution_metadata.extra with the per-completion payload (a sketch follows this list):
    • prompt_ids: list[int] (shared across completions)
    • completion_ids: list[list[int]] (one per completion)
    • inference_logprobs: list[list[float]] (aligned to completion tokens)
    • completions_text: list[str]
    • truncated: list[bool] (True when finish_reason == 'length')
    • finish_reasons: list[str]
  • Merges into pre-existing extra rather than clobbering it (coexists with OpenEnvRolloutProcessor, tracing_utils, etc.).
  • Caches one FireworksV1CompletionsClient per model id; closes them all via acleanup().
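
To make the payload concrete, here is a minimal sketch of what execution_metadata.extra could look like for n=2 completions of a three-token prompt; the key names follow the list above, while the token ids and logprob values are invented for illustration:

```python
# Illustrative extra payload for n=2 completions; all values are made up.
extra = {
    "prompt_ids": [128000, 9906, 1917],                 # list[int], shared across completions
    "completion_ids": [[791, 4320], [2028, 374, 264]],  # list[list[int]], one per completion
    "inference_logprobs": [[-0.12, -1.05], [-0.33, -0.80, -2.10]],  # aligned to completion tokens
    "completions_text": ["The answer", "This is a"],
    "truncated": [False, True],                         # second completion hit max_tokens
    "finish_reasons": ["stop", "length"],
}
```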

Shape rationale

OpenEnvRolloutProcessor already writes flat prompt_ids / completion_ids concatenated across turns (multi-turn, per-episode agent rollouts). Single-turn RFT samples n>1 completions per prompt for advantage estimation and needs per-completion indexing, hence the list[list[...]] shape here. A training adapter on the consumer side can key into either convention without loss of generality.
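
As a sketch of that consumer-side adapter, assuming the flat convention stores completion_ids as a single list[int] (the helper name here is illustrative, not an eval-protocol API):

```python
# Hypothetical adapter: normalize either convention to per-completion lists.
def per_completion_ids(extra: dict) -> list[list[int]]:
    ids = extra["completion_ids"]
    if ids and isinstance(ids[0], int):
        # Flat OpenEnvRolloutProcessor convention: one concatenated episode.
        return [ids]
    # Nested FireworksTrainingRolloutProcessor convention: already per-completion.
    return ids
```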

Architecture

```mermaid
flowchart LR
    Row[EvaluationRow<br/>messages, tools] -->|messages → dicts| P[FireworksTrainingRolloutProcessor]
    P -->|build_prompt_token_ids| Client[FireworksV1CompletionsClient]
    Client -->|/v1/completions × n| API[Fireworks API]
    API -->|n completions<br/>with prompt_ids,<br/>completion_ids, logprobs| P
    P -->|first completion<br/>→ assistant message| OutMsgs[row.messages]
    P -->|execution_metadata.extra| Extra[prompt_ids, completion_ids,<br/>inference_logprobs, completions_text,<br/>truncated, finish_reasons]
```

Type of Change

  • New feature

Testing

8 new unit tests in tests/pytest/test_fireworks_training_rollout_processor.py, using a stub FireworksV1CompletionsClient so no network calls or HF tokenizers are required (one representative case is sketched after the output below):

  • Per-completion extra payload has n-length lists with correct shapes and values (n=2 case).
  • n=1 still produces the list-of-lists shape with length 1 (not a scalar).
  • Trailing assistant messages are dropped by default before sampling.
  • Trailing assistant messages are preserved when the flag is disabled.
  • Missing model in completion_params raises ValueError.
  • n < 1 raises ValueError.
  • acleanup() closes every cached client.
  • Pre-existing keys in execution_metadata.extra are preserved across rollout.

All existing SingleTurnRolloutProcessor tests still pass.

```
$ python -m pytest tests/pytest/test_fireworks_training_rollout_processor.py tests/pytest/test_single_turn_rollout_processor.py
11 passed in 3.79s
```
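
For flavor, a sketch of the n=1 shape test in the spirit of the list above; the fixtures (run_rollout, make_row) are hypothetical stand-ins for the project's own test helpers, not the actual file contents:

```python
# Hypothetical shape test: n=1 must stay list-of-lists, never flatten.
def test_n1_keeps_list_of_lists_shape(run_rollout, make_row):
    row = run_rollout(make_row(completion_params={"model": "m", "n": 1}))
    extra = row.execution_metadata.extra
    assert len(extra["completion_ids"]) == 1
    assert isinstance(extra["completion_ids"][0], list)  # not a flat token list
    assert len(extra["inference_logprobs"]) == 1
    assert len(extra["truncated"]) == len(extra["finish_reasons"]) == 1
```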

Surface

  • New public default processor exposed via eval_protocol.pytest.FireworksTrainingRolloutProcessor (usage sketch below).
  • No breaking changes to existing processors, RolloutProcessor base class, or EvaluationRow schema — all new data is carried through the existing execution_metadata.extra bag.
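
Hypothetical wiring, assuming the evaluation_test entry point that eval_protocol.pytest exposes for other processors; the model id and decorator arguments are illustrative:

```python
from eval_protocol.pytest import FireworksTrainingRolloutProcessor, evaluation_test

@evaluation_test(
    completion_params=[{"model": "my-model", "n": 4, "temperature": 1.0}],
    rollout_processor=FireworksTrainingRolloutProcessor(),
)
def test_rft_rollout(row):
    # Evaluator logic is unchanged; the training payload rides along in
    # row.execution_metadata.extra for a downstream training adapter.
    ...
```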

Follow-ups (not in this PR)

  • Fireworks managed RFT control plane (separate repo) will auto-select this processor for eval-v3 evaluator bundles launched from RFT jobs — tracked as FIR2-1352.
  • Fireworks control plane dataset-transform step will consume the execution_metadata.extra shape introduced here — tracked as FIR2-1353.
  • End-to-end parity test (legacy RFT path vs. this processor) on GSM8K — tracked as FIR2-1366.

A new default RolloutProcessor that drives Fireworks /v1/completions
via `FireworksV1CompletionsClient` and surfaces the per-sample
token-level data required by reinforcement fine-tuning training
(GRPO, CISPO, DAPO, GSPO).

Problem
-------
The existing `SingleTurnRolloutProcessor` uses LiteLLM chat
completions and discards token ids + inference logprobs, so scored
`EvaluationRow`s are fine for evaluation but cannot feed a training
loop. Today, teams that need training-ready rollouts write a bespoke
`RolloutProcessor` (the FrozenLake example in fw-ai/cookbook is
~800 lines). This puts token ids / logprobs out of reach of every
customer evaluator bundle unless they rewrite their own processor.

What it does
------------
For each `EvaluationRow`, `FireworksTrainingRolloutProcessor`:

* Reads model / temperature / max_tokens / n from `completion_params`.
* Builds prompt token ids locally via `FireworksV1CompletionsClient`.
* Fires `n` parallel `/v1/completions` calls from the same
  `prompt_token_ids`, so each completion gets independent retry
  behaviour rather than collapsing on partial server failures.
* Appends the first completion as the assistant message so existing
  evaluators that inspect `last_assistant_message()` keep scoring.
* Populates `EvaluationRow.execution_metadata.extra` with:
  - `prompt_ids: list[int]` (shared across completions)
  - `completion_ids: list[list[int]]` (per-completion)
  - `inference_logprobs: list[list[float]]` (aligned to completion tokens)
  - `completions_text: list[str]`
  - `truncated: list[bool]` (`finish_reason == 'length'`)
  - `finish_reasons: list[str]`
* Merges into pre-existing `extra` rather than clobbering it.
* Caches one client per model id; closes them all via `acleanup()`.

Shape rationale
---------------
OpenEnvRolloutProcessor already writes flat `prompt_ids` /
`completion_ids` concatenated across turns (multi-turn, per-episode
agent rollouts). Single-turn RFT samples n>1 completions per prompt
for advantage estimation and needs per-completion indexing, hence the
`list[list[...]]` shape here. The training adapter on the
consumer side can key into either convention without loss of
generality.

Tests
-----
8 new unit tests stub `FireworksV1CompletionsClient` so no network
calls or tokenizers are needed; existing
`SingleTurnRolloutProcessor` suite still passes.

Fixes FIR2-1351

benjibc commented Apr 22, 2026

Closing: not needed.

Upstream review revealed that the Fireworks managed RFT cutover does not require a training-aware RolloutProcessor. The cleaner pattern (demonstrated by fw-ai/fireworks#21366 — scripts/rollr_cispo/train_cispo_cookbook.py) is to use a plain RemoteRolloutProcessor for the rollout/eval boundary and reconstruct the training-relevant data trainer-side via:

  • local tokenization using FireworksV1CompletionsClient.build_prompt_token_ids / build_assistant_turn_token_ids
  • a prefill-logprobs pass (echo=true) against the same inference deployment the trainer uses

That pattern works for CISPO (and by extension GRPO/DAPO/GSPO) today, and it keeps eval-protocol's rollout contract single-purpose. No reason to push a training-specific default into EP.
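
For context, a minimal sketch of that trainer-side prefill-logprobs pass, assuming an OpenAI-style /v1/completions endpoint that honors echo with max_tokens=0 and accepts a token-id prompt; the endpoint URL, model id, and token lists below are illustrative:

```python
import os
import requests

# Token ids from the local tokenization step (illustrative values).
prompt_ids = [128000, 9906, 1917]   # via build_prompt_token_ids
completion_ids = [791, 4320]        # via build_assistant_turn_token_ids

resp = requests.post(
    "https://api.fireworks.ai/inference/v1/completions",
    headers={"Authorization": f"Bearer {os.environ['FIREWORKS_API_KEY']}"},
    json={
        "model": "my-deployment",               # illustrative
        "prompt": prompt_ids + completion_ids,  # replay the sampled tokens
        "max_tokens": 0,                        # generate nothing new
        "echo": True,                           # echo prompt with logprobs
        "logprobs": 1,
    },
)
token_logprobs = resp.json()["choices"][0]["logprobs"]["token_logprobs"]
```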

No harm done — the module is self-contained and not wired into anything else.

benjibc closed this Apr 22, 2026